NATURE VOL. 326 26 MARCH 1987

REVIEW ARTICLE



Knowledge-based prediction of protein structures 
and the design of novel molecules 

T. L. Blundell, B. L. Sibanda, M. J. E. Sternberg & J. M. Thornton 

[...] antigens in vaccine design, and novel molecules in protein engineering.



Many developments of industrial, clinical and agricultural importance may be achieved in the coming years [...] interactions between macromolecules and ligands that occur naturally in the living cell. For example, the design of drugs, herbicides and pesticides may be improved from knowledge of the interaction of a molecule with an isolated receptor, enzyme or nucleic acid. The specificity, stability or activity of engineered hormones of clinical importance or enzymes for the chemical industry may be improved from knowledge of proteins in general¹. New peptide and protein vaccines may also be designed from information on antibody-antigen interactions. [...] electronic devices may be constructed using novel molecules, perhaps proteins, that aggregate in a predetermined way (as they do in [...]) [...] molecules that conduct electrons; these will be the new biological microchips.
These processes require information on the shape, charge, chemical function and dynamical flexibility of at least two interacting molecules. In most cases, at least one of the molecules [...] protein: enzyme, receptor, immunoglobulin, redox [...] depends on knowledge of detailed three-dimensional structures defined by X-ray analysis, although medium resolution structures of small proteins in non-crystalline environments will increasingly come from two-dimensional NMR techniques in the future. However, our knowledge of three-dimensional structures of proteins defined by these methods has increased only slowly, in spite of the greater number of laboratories involved in this work.
[...] recombinant DNA techniques has [...] of information concerning the sequences of receptors and enzymes important for drug, herbicide and pesticide [...] site-specific and deletion [...] possibilities of engineering new proteins [...] and even with new functions. Very few proteins that have been sequenced and can [...] three-dimensional structures, and so rational approaches to design are difficult. However, computer technology has also [...] some compensatory possibilities. These include the use of database technology to store, retrieve and compare the known sequences; computer graphics to display models and manipulate known three-dimensional structures; and computer simulation to calculate low-energy conformers, normal modes and molecular dynamics. Together these new technologies offer the chance of exploiting our experimental knowledge of the three-dimensional structures of proteins in a rational approach to the modelling of protein interactions and the design of novel molecules.



In this review we emphasize our own view that modelling of 
proteins and their interactions is most usefully carried out using 
a knowledge-based approach. This depends on the identification 
of analogies between a protein that is to be modelled and other 
proteins of known three-dimensional structure at all levels in 
the hierarchy of protein organization: secondary structure 
motifs, domains and quaternary or ligand interactions. 

Sequence alignment 

The first step in any modelling is to convert the DNA sequence available into the primary structure of the gene product and search for sequence homologies with known sequences of other proteins. Information on DNA sequences is available in databanks such as those of the European Molecular Biology Laboratory [...] Biomedical Research [...] Maryland, United States, or GenBank (Bolt, Beranek and Newman Inc.) (4,000,000 bases and 5,000 [...]). Protein sequences are collected together in the Protein Information Resource (PIR) databank at the National Biomedical [...] the NEWAT databank in the USA. The PIR databank now contains over 3,000 protein [...] by translation [...] (for a review of sequence databases, see ref. 8). The National Biomedical Research Foundation provides some software to search their databanks, to compare user-specific segments with segments of the same length in the database, to align two sequences, to detect similarities and to score and display the degree of similarity.

The algorithms of Needleman and Wunsch⁹, Sellers¹⁰ and Waterman et al.¹¹ use dynamic programming methods to carry out a global comparison of the sequences. Their success depends very much on the degree of similarity [...] depending on the [...] parameters chosen, even for closely related sequences. For [...] automatic procedures [...] background of randomized sequences. As an approximate guide, if the alignment score is more than six standard deviations above that for randomized [...] secondary structures will be correctly aligned. Because insertions and deletions often occur at the [...] secondary structures, improvements can be made by introducing penalties for insertions/deletions in α-helices or β-strands.

For very distantly related sequences, homology may be restricted [...] whose separation along the chain may vary considerably between proteins. There [...]
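The dynamic programming comparison and the "standard deviations above randomized sequences" guide described above can be sketched as follows. This is a minimal illustration, not the procedure of any of the cited programs: the scoring values (`match`, `mismatch`, `gap`), the number of shuffles and the function names are all assumptions chosen for the example, and a real comparison would use an amino-acid substitution table rather than identity scoring.

```python
import random
import statistics


def global_alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score by dynamic programming (Needleman-Wunsch style).

    The scoring values are illustrative assumptions, not taken from the review.
    """
    # previous[j] holds the best score for aligning the processed part of seq_a
    # with the first j residues of seq_b.
    previous = [j * gap for j in range(len(seq_b) + 1)]
    for i, a in enumerate(seq_a, start=1):
        current = [i * gap]
        for j, b in enumerate(seq_b, start=1):
            diagonal = previous[j - 1] + (match if a == b else mismatch)
            current.append(max(diagonal, previous[j] + gap, current[j - 1] + gap))
        previous = current
    return previous[-1]


def shuffled_z_score(seq_a, seq_b, trials=100, seed=0):
    """Number of standard deviations by which the real score exceeds the mean
    score obtained after shuffling seq_b (a randomized background)."""
    rng = random.Random(seed)
    real = global_alignment_score(seq_a, seq_b)
    letters = list(seq_b)
    background = []
    for _ in range(trials):
        rng.shuffle(letters)
        background.append(global_alignment_score(seq_a, "".join(letters)))
    spread = statistics.pstdev(background)
    if spread == 0:
        return float("inf")
    return (real - statistics.mean(background)) / spread
```

Under the approximate guide quoted above, a value of `shuffled_z_score` greater than six would suggest that the alignment is likely to be reliable; the threshold applies to the score distribution, whatever scoring scheme is chosen.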







Fig. 1 A stereo view comparing the three-dimensional structures of two aspartic proteinases, endothiapepsin and penicillopepsin, indicates that the central cores and active-site-cleft residues are closely similar. Diversity is achieved mainly in the loop regions that occur at the periphery of the bilobal enzymes.



The tertiary structures of homologous proteins 

Comparison of the tertiary structures of homologous proteins shows that three-dimensional structures are conserved in evolution more than protein primary structures and considerably more than DNA sequences (ref. 19 for review). In recently diverged molecules the elements of secondary structure, α-helices and β-strands, are arranged in closely comparable three-dimensional topologies. Amino-acid replacements occur most often in surface positions so that the main chain conformations are little affected. More radical insertions and deletions tend to occur in surface loops between the secondary structure units, although insertions are allowed in some β-strands to give β-bulges. In the insulin family this divergence has occurred with complete conservation of the hydrophobic core.

However, in most protein families divergence is accompanied by changes in the hydrophobic core. This has been examined in an elegant series of papers by Chothia and Lesk²³⁻²⁶ who have shown that although the volume of the core remains approximately constant in many families, amino-acid replacements of hydrophobic core residues are usually accommodated not by complementary amino-acid substitutions but rather by movements of secondary structural elements. They have shown that in the α-helical globins large relative shifts of certain helices are observed, but in the β-sheet family of azurins/plastocyanins relative rotations of the β-strands have occurred. To [...] extent this appears to be family dependent and is less evident not only in the immunoglobulins, where an intersheet disulphide bridge pins them together, but also in the core of other β-sheet proteins such as aspartic proteinases (Fig. 1).

[...] in the hydrophobic core [...] the topological equivalence of residues in homologous proteins is identified by a least-squares [...] proteins [...] dimensions and calculates the root-mean-square (r.m.s.) differences of equivalent Cα atoms. A problem arises in the definition of which residues constitute the core. Chothia and Lesk²⁶ first superposed [...] of secondary structure and then extended [...] as long as the deviations did not exceed 3 Å. Using this definition the 'common core' [...] for proteins of [...] r.m.s. deviation averaged 1 Å, but there was a correlation between r.m.s. deviation and percentage residue identity of the complete protein.
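The r.m.s. measure and the 3 Å criterion above can be illustrated with a short sketch. It assumes that the two sets of Cα coordinates have already been aligned residue by residue and superposed by a least-squares fit, and it applies the 3 Å limit as a simple filter over all equivalent pairs rather than as the stepwise extension from secondary structure elements described in the text. The function names and the coordinate representation are illustrative assumptions.

```python
import math


def distance(p, q):
    """Euclidean distance between two (x, y, z) points, in the input units."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def common_core_rms(coords_a, coords_b, cutoff=3.0):
    """Return (number of core pairs, r.m.s. deviation over those pairs).

    coords_a and coords_b are equal-length lists of C-alpha positions for
    topologically equivalent residues, assumed to be already superposed.
    Pairs deviating by more than `cutoff` are excluded from the core.
    """
    deviations = [distance(p, q) for p, q in zip(coords_a, coords_b)]
    core = [d for d in deviations if d <= cutoff]
    if not core:
        return 0, float("nan")
    rms = math.sqrt(sum(d * d for d in core) / len(core))
    return len(core), rms
```

For example, three equivalent pairs deviating by 1 Å, 1 Å and 5 Å give a two-residue core with an r.m.s. deviation of 1 Å; the third pair falls outside the cutoff and does not contribute.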

This definition of the common core may not always be most appropriate. If the core is defined by those residues whose side chains are largely inaccessible to solvent, a similar relationship is found, although the number of residues and the average r.m.s. deviations are much smaller (Fig. 2).

[...] differences between homologous structures arise from [...] conformational differences [...] crystalline environments. This can be [...] proteins of identical sequence whose tertiary structures have been defined independently. The hydrophobic cores may differ by only 0.2 Å in well defined high resolution (<1.8 Å) structures, but for all residues the differences are likely to be 0.3-0.6 Å. Most significantly, the differences can be 1 Å for structures at intermediate (2.5-3.0 Å) resolutions.



Modelling by homology 



[...] three-dimensional structures defined by high resolution X-ray analysis, how should we proceed to construct a model? Historically, this has been achieved in several simple stages starting with alignment of the sequences followed by creation of insertions, deletions and replacements in the known three-dimensional structure of the homologous protein. This was first carried out in 1969 by Browne and co-workers²⁸ in the construction of a model of α-lactalbumin based on the three-dimensional structure of lysozyme which had been defined by X-ray analysis. Although it was achieved using physical models, subsequent attempts exploited computer [...] technology developed, especially the program FRODO²⁹. The initial models have been refined using energy minimization techniques to give final structures without steric clashes. Molecules modelled in this way have included



[Fig. 2 graph: vertical axis from 0.00 to 1.00; horizontal axis "Pairs with identical residues (%)", running from 100 to 20.]

Fig. 2 A graph in which stars (*) indicate the root-mean-square (r.m.s.) separation of topologically equivalent residue (Cα) positions [...] proteins plotted as a function of the degree of [...] equivalent residues that are <3 Å in separation when the two proteins are arranged as a best fit. The best fitting curve is given as a continuous line. Crosses (+) show residues whose side chains are <7% accessible to the solvent and the best-fitting curve is given by a dashed line. The percentage identities refer only to the residues compared. For any particular protein the [...] but the r.m.s. separation and [...] residues are compared. However, apart from proteins of low homology, the relationship between the r.m.s. separation and sequence homology is similar in each analysis (T. Hubbard, personal communication).






[...] serine proteinases, [...] aspartic proteinases such as renin, and immunoglobulins (ref. 41 for review).

Modelling using multiple structures 

[...] databank, of which only 100 are non-homologous, there may well be more than one structure available to serve as a basis for modelling [...] models for packing of side chains and their solvent accessibilities. Thus, calmodulin was constructed on the basis of [...] calcium-binding protein and carp parvalbumin. Although each sequence had approximately the same sequence [...] calmodulin, certain features of the model [...]
An alternative knowledge-based approach is to use simultaneously but selectively the information from all the known tertiary structures of the homologous family. For example, the [...] mammalian serine proteinases [...] defined by X-ray analyses. The first step is to align the sequences [...] of homologous proteins [...] are evenly weighted; it is not satisfactory to fit by least squares criteria all the tertiary structures to one of the family. One approach is to construct a 'framework' that [...] three-dimensional [...] of the topologically equivalent residues common to the family. The second step is to align the new sequence with those of the [...] topologically equivalent residues of the 'framework' are equivalent in the sequence alignment. This generally involves some realignment from comparisons of the [...]; it may be [...] by hand or by using the [...] technique of Taylor¹⁸. Fragments [...] each of [...] 'closest' in sequence are selected; generally it is preferable to select fragments which join one secondary structure to another and if possible link [...] plasminogen activator on the basis of [...] available trypsin, elastase, chymotrypsin and [...] structures, seventeen fragments were used (Fig. 3). A [...] independently by [...]
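One simple way to weight every member of a family evenly, as the framework idea above requires, is to average the positions of the topologically equivalent residues. The sketch below makes that assumption explicit: it treats the framework as the mean Cα position at each equivalent site and assumes that the structures have already been brought into a common coordinate frame. It is an illustration of the principle, not the published procedure, and the function name is hypothetical.

```python
def framework(structures):
    """Mean position of each topologically equivalent residue.

    `structures` is a list of coordinate lists, one per family member; each
    inner list holds (x, y, z) tuples for the same equivalent residues in the
    same order, already expressed in a common frame (an assumption here).
    """
    if not structures:
        raise ValueError("at least one structure is required")
    n_residues = len(structures[0])
    if any(len(s) != n_residues for s in structures):
        raise ValueError("all structures must list the same equivalent residues")
    n_structures = len(structures)
    mean_positions = []
    for i in range(n_residues):
        # Each family member contributes with equal weight.
        mean_positions.append(
            tuple(sum(s[i][k] for s in structures) / n_structures for k in range(3))
        )
    return mean_positions
```

Because no single structure is used as the reference, the result does not depend on the order in which the family members are supplied.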

Insertions and deletions in loop regions 

[...] of significant differences in protein structure [...] most difficult to construct. The choice of a conformation found in a loop of [...] a homologous protein is often a good guide [...] although this will occasionally be misleading. [...] problem remains how to identify loops [...] where conformations differ. If there is no sequence of equivalent length in any [...] for loops [...]







Fig. 3 [...] plasminogen activator constructed on the basis of four other mammalian serine proteases of defined [...]



[...] firstly identifying the loop length, secondly [...]

The power of this technique is shown in models of mouse [...] were modelled on the basis of endothiapepsin. For example, sequences of equivalent loops in endothiapepsin and chymosin are given as



Residue number   197  198  199  200  201  202  203  204
Endothiapepsin    Y    A    V    G    S    G    T   [?]
Chymosin          V    T    I   [?]  [?]   V    V   [?]

[...] endothiapepsin this is a two-residue [...] at the first position. The sequence of [...] glycine [...] the second position [...] type [...]. In a second example a deletion of one residue is observed. Thus:

Residue number   [...] 240 241 242 243 244 245 246 247 [...]
Endothiapepsin   A K S S S S V G [?] Y V
Chymosin         A T Q N [?] Y - G [?] [?]



In endothiapepsin this is a standard four-residue loop with a glycine at position 4. In chymosin there is a deletion of one residue leading to a three-residue loop, but the presence of [...] one hydrogen bond to give a standard five-residue loop with a glycine at position 4 (Fig. 5). A similar approach has been adopted by Jones and Thirup⁵⁶ [...] loops of the correct length using an efficient method based on the distance between residues [...] matching of distance matrices. Alternatively the appropriate [...] conformation [...] be identified in [...]

[...] no structure appropriate for the modelling exists in conformations of proteins defined by high resolution X-ray analysis, we must adopt an alternative ab initio approach [...]






Table 1 Systematic modelling of β-hairpins

[Table graphic not reproduced; one axis is labelled REPLACEMENT.]

[...] have identical numbers of residues and represent differing conformations that may be preferred by particular amino-acid sequences [...] modelling replacements. Structures in columns represent hairpin conformations with differing numbers of [...] modelling insertions or deletions. Loss of the top hydrogen bond implies a horizontal movement to the right (3:3→3:5) whereas loss or gain of two hydrogen bonds implies a change such as 3:5→7:9 which belongs to the same class. Preferred sequence patterns are indicated [...]


[...] a distance geometry algorithm⁵⁷ or a systematic search [...] conformers using molecular dynamics.
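The selection of loops of the correct length by matching distance matrices, mentioned above, can be sketched as a search over fragments of known structures. The example compares the distances between the Cα atoms flanking the loop in the model with the same distances in every candidate fragment of a small database. The number of flanking residues, the tolerance and all names are assumptions made for illustration; the published method is more elaborate.

```python
import math


def distance(p, q):
    """Euclidean distance between two (x, y, z) points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def anchor_matrix(left, right):
    """Flattened matrix of distances between N- and C-terminal anchor atoms."""
    return [distance(p, q) for p in left for q in right]


def find_loop_fragments(left, right, loop_length, database, tolerance=1.0):
    """Find database fragments whose flanking residues match the model anchors.

    left, right : C-alpha positions flanking the loop in the model
    loop_length : number of residues to be built between the anchors
    database    : dict mapping a chain name to its list of C-alpha positions
    Returns sorted (worst distance mismatch, chain name, index of first loop
    residue) tuples for fragments within `tolerance`.
    """
    if not left or len(left) != len(right):
        raise ValueError("need the same non-zero number of anchors on each side")
    n_anchor = len(left)
    target = anchor_matrix(left, right)
    span = 2 * n_anchor + loop_length
    hits = []
    for name, chain in database.items():
        for start in range(len(chain) - span + 1):
            cand_left = chain[start:start + n_anchor]
            cand_right = chain[start + n_anchor + loop_length:start + span]
            candidate = anchor_matrix(cand_left, cand_right)
            worst = max(abs(c - t) for c, t in zip(candidate, target))
            if worst <= tolerance:
                hits.append((worst, name, start + n_anchor))
    return sorted(hits)
```

Because only internal distances are compared, no superposition is needed during the search; the best candidates can be fitted onto the anchors afterwards and checked against the rest of the model.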

[...] the appropriate backbone, the amino acids that differ between known structures and the modelled structure [...] the side chain [...] region of three-dimensional space in the [...] of the replacement of residues [...] chains at topologically equivalent [...]

Energy minimization and molecular dynamics 

[...] sequence homology of 50% or more [...] predicted by the methods described here will be probably [...] although individual side chains may [...] but not all the errors can be [...] of the potential energy using standard programs such as AMBER⁵⁸ or CHARMM⁵⁹. Whereas X-ray analysis gives a time- and space-averaged structure [...] crystalline lattice with perhaps 40% solvent, energy minimization gives the structure at a single minimum, usually in vacuo as solvent is not often simulated. Efforts to simulate the aqueous environment using Monte Carlo methods and perhaps to model specifically tightly bound waters are particularly helpful for surface side chains, although some contraction of the protein volumes tends to occur unless the molecule and solvent are [...] of energy minimization in this way finds only a local minimum and may be expected to be useful if the errors are relatively small (usually <1 Å).
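The statement that minimization finds only a local minimum can be illustrated with a one-dimensional toy example. The energy function below is an arbitrary tilted double well chosen for the demonstration, not a molecular force field, and the step size and tolerance are likewise assumptions.

```python
def energy(x):
    """Toy potential with a shallow minimum near x = +0.96 and a deeper one near x = -1.04."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    """Analytical derivative of the toy potential."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def steepest_descent(x, step=0.01, tolerance=1e-8, max_steps=100000):
    """Follow the downhill gradient from x until it effectively vanishes."""
    for _ in range(max_steps):
        g = gradient(x)
        if abs(g) < tolerance:
            break
        x -= step * g
    return x


# A start on the right-hand side of the barrier stays in the shallow well,
# although a lower minimum exists on the other side.
trapped = steepest_descent(0.8)
lowest = steepest_descent(-0.8)
```

Starting from `0.8` the search settles in the higher-energy well and never crosses the barrier, which is why minimization alone can be expected to help only when the starting model is already close to the correct structure.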

To explore local changes requiring larger shifts, methods of constraining the major part of the structure whilst allowing freedom in the region of interest have been explored. With the extent of the calculation reduced, side-chain torsion angles can be varied to locate conformations for subsequent energy minimization. Alternatively, energy calculations or molecular dynamics may often be useful to explore a range of local conformations, as developed by Robson and co-workers. Energy minimization used in the conventional way will not be useful [...] energy minimum can always be found. No one has modelled successfully changes such as those identified by Lesk





Fig. 4 A stereo view of three similar β-hairpin structures (of type 3:5, see Table 1) superposed with r.m.s. error (calculated for the five residues in the loop) of 0.56 Å. The examples given are from penicillopepsin (APP: residues 72-82), carboxypeptidase A (CPA: residues 38-48) and lysozyme (LZM: residues 17-27).

and Chothia in the globins where an α-helix can differ in position by 7 Å and rotational angle by 30°. The best chance appears to be in a simplified molecular dynamics procedure with rigid secondary structural units, or an interactive docking procedure such as that of Wodak et al.

How correct are the models?

[...] subsequently by X-ray analysis. The first of these models, that of α-lytic proteinase, was flawed by an incorrect sequence alignment. Others [...]. For example, the three-dimensional structures of rat γ3-1 crystallin and the orthologous protein γIV from calf were modelled on the basis of γII-crystallin. The three-dimensional structure of the calf γIV defined by X-ray analysis at 2.3 Å resolution has been shown to have a r.m.s. [...] modelled the variable domain of immunoglobulin and have correlated their prediction with the recently defined X-ray analysis of Poljak and co-workers. [...] conformations correctly in 4 of the 6 hypervariable regions, but also attained a r.m.s. [...] region for the main chain.

[...] constructed on the basis of one homologous [...] percentage sequence identity to the r.m.s. difference given by Chothia and Lesk is a reasonable [...] human [...] equine [...]-chains of haemoglobin has an r.m.s. difference of 1.40 Å from the structure defined by X-ray analysis. The precision can be improved by fitting the modelled structure to the framework derived from all the homologous structures. This leads to a smaller r.m.s. difference of 1.15 Å between the model and the X-ray structure when the α- and β-chains of equine and human haemoglobins are used.

Although the absolute value of potential energy of the model as calculated by programs such as CHARMM is not a good guide to its correctness, the distribution of the hydrophobic side chains and the nature of the solvent-accessible surface are more sensitive indicators; these are features that can be quantified but are also easily assessed visually. For example, a buried charged side chain that is not hydrogen-bonded is almost certainly an indication that the model is incorrect.
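The check described above, a buried charged side chain with no hydrogen bond, is easy to automate once accessibilities and hydrogen bonds have been computed by other means. The sketch below assumes a hypothetical per-residue record with a residue name, a relative side-chain accessibility in per cent and a count of side-chain hydrogen bonds. The 7% burial limit is borrowed from the accessibility criterion quoted in the Fig. 2 legend, and the set of charged residues is an illustrative choice.

```python
# Residue types treated as charged here (an illustrative choice; histidine is omitted).
CHARGED = {"ASP", "GLU", "LYS", "ARG"}


def suspicious_residues(residues, buried_below=7.0):
    """Return identifiers of buried, charged residues lacking hydrogen bonds.

    Each record is a dict with the hypothetical keys:
      "id"            residue identifier, e.g. "ASP 32"
      "name"          three-letter residue name
      "accessibility" relative side-chain accessibility (per cent)
      "hbonds"        number of side-chain hydrogen bonds in the model
    """
    return [
        r["id"]
        for r in residues
        if r["name"] in CHARGED
        and r["accessibility"] < buried_below
        and r["hbonds"] == 0
    ]
```

An empty result does not show that a model is correct; a non-empty one marks places where the alignment or the side-chain placement deserves a second look.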

Perhaps the most useful indication of the correctness of a model is its ability to predict a molecular interaction that can be experimentally tested. Models of insulin-like growth factors demonstrated that these molecules might bind insulin receptors. Electrostatic potential surfaces generated for the model of calmodulin have suggested a binding site consistent with experimental data for basic amphiphilic α-helical peptides. The size and chemical nature of the predicted combining sites of anti-lysozyme monoclonal antibodies have been tested and are consistent with epitope boundaries. The size, shape and position of the specificity pockets of human renin have been assessed as useful predictors for the design of inhibitors.

Future developments 

There are two challenges for the future. First, to extend the [...] no homology may be found, but where the unknown structure is probably constructed



Fig. 5 A model of chymosin residues 238-248 which form a β-hairpin (broken line). The model is based on [...] refined structure of endothiapepsin (continuous line). The one-residue deletion in the chymosin structure results in a remodelling of the family 4:4 β-hairpin of endothiapepsin to a [...] hairpin in chymosin.






from the structural motifs, βαβ units, hairpins, an 8-fold barrel such as found in triose phosphate isomerase⁶⁸ or a jelly-roll structure, such as is found in the viruses⁶⁹. The problem is to identify these motifs from the sequence through detailed study of the critical sequence patterns for each motif and to establish the sequence fingerprint against which a sequence of unknown structure can be compared.

For example, a particular pattern of a glycine and a serine along with a few conservatively varied residues and hydrophobic regions in a sequence of ~40 residues has been used to identify a Greek-key structure in the surface protein S of the sporulating bacterium Myxococcus xanthus even though there is no significant overall homology⁷⁰. Another pattern, Gly-X-Gly-X-X-Gly, where X is any amino acid, has been useful for identification of an α/β structure in the p21 protein⁷¹ and in tyrosyl kinases⁷².
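A fingerprint as simple as Gly-X-Gly-X-X-Gly can be searched for directly in a one-letter sequence. The sketch below uses a regular expression with a lookahead so that overlapping occurrences are also reported; the function name and the 1-based position convention are choices made for this example. Fuller fingerprints, such as the ~40-residue pattern mentioned above, need position-specific scoring rather than an exact match.

```python
import re

# G, any residue, G, any two residues, G; the lookahead allows overlapping hits.
FINGERPRINT = re.compile(r"(?=(G.G..G))")


def find_fingerprint(sequence):
    """Return (1-based start position, matched segment) for every occurrence."""
    return [(m.start() + 1, m.group(1)) for m in FINGERPRINT.finditer(sequence)]
```

For instance, `find_fingerprint("AAGAGAAGAA")` reports a single occurrence starting at residue 3. A match indicates only that the pattern is present; whether the surrounding sequence is compatible with the motif still has to be judged from the flanking residues.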
Second, we must use the methods to design novel molecules. For protein engineering replacements, small insertions and deletions, the approach is a natural extension of the modelling procedures. Sequences in similar structural motifs can be identified from known protein structures and grafted onto the known structure. Local conformations can be explored by global searches, restrained minimization approaches or molecular dynamics followed by energy minimization. In the design of hybrid molecules, for example linking monoclonal antibody Fab fragments to enzymes such as proteases for dissolving blood clots, or to toxins for killing tumour cells, the identification of stable domains and the design of linker regions will be of central importance. In the design of novel tertiary structures which might be endowed with special binding or catalytic functions, the complete technology evolved for modelling will be essential.

In drug design the modelling of the receptor must be complemented with the definition of the ligand-binding interaction. Much is to be learnt from interactions in homologous systems such as other members of the enzyme family. Thus, a starting point for modelling human angiotensinogen interactions with renin is usefully obtained from high resolution X-ray studies of inhibitors with other aspartic proteinases such as endothiapepsin. Also, detailed analyses of side chain-side chain and side chain-ligand interactions can define preferred orientations, for example of phenylalanine rings with oxygens⁷⁵, sulphur⁷⁶ and other aromatic groups⁷⁷.

For vaccine design, the synthetic molecule must mimic the antigen to stimulate the appropriate immune response. Optimal activity will be achieved by designing the synthetic molecule to mimic the critical features of the native antigen as closely as possible. As in drug design it may be advantageous to restrict flexibility, for example by ring closure. Alternatively an antigenic loop peptide may be grafted from one protein into another using the technique of protein engineering. Indeed it may be possible to produce a protein vaccine that incorporates antigenic peptides from many proteins and confers multi-valent immunity. For rational design of such complex vaccines and even simple epitopes, model building using powerful computer graphics, including superposing, annealing and energy minimizing structures, will provide starting points to guide the experimenter.

We thank Willie Taylor, Ian Tickle, Suhail Islam, Michael Sutcliffe, Tim Hubbard, David Barlow, Dee Atwal, Ilyas Haneef, Andrew Hemmings, James Milner-White, Geoff Barton, Mark Edwards, Marketa Zvelebil, Jus Singh and Stephen Bryant for discussion of the ideas discussed in this review. We thank the SERC, ICI, Celltech, Glaxo and Sturge for financial support.



1. Blundell, T. L. et al. Phil. Trans. R. Soc. A 317, 333-344 (1986).
2. [...]
3. [...] Proc. natn. Acad. Sci. U.S.A. 82, 1874-1878 (1985).
4. Wagner, G. & Wüthrich, K. J. molec. Biol. 155, 347-366 (1982).
5. Winter, G. & Fersht, A. R. Trends Biotechnol. 2, 115-119 (1984).
6. Richards, W. G. (ed.) J. molec. Graphics 4, 1-73 (1986).
7. Brooks, B. & Karplus, M. Proc. natn. Acad. Sci. U.S.A. 80, 6571-6575 (1983).
8. Kneale, G. G. & Bishop, M. J. Cabios Rev. 1, 11-17 (1985).
9. Needleman, S. B. & Wunsch, C. D. J. molec. Biol. 48, 443-453 (1970).
10. Sellers, P. J. appl. Math. 26, 787-793 (1974).
11. Waterman, M. S., Smith, T. F. & Beyer, W. A. Adv. Math. 20, 367-387 (1976).
12. Fitch, W. M. & Smith, T. F. Proc. natn. Acad. Sci. U.S.A. 80, 1382-1386 (1983).
13. Barton, G. & Sternberg, M. J. E. Protein Engng (in the press).
14. Lesk, A., Levitt, M. & Chothia, C. Protein Engng 1, 77-78 (1987).
15. Goad, W. B. & Kanehisa, M. I. Nucleic Acids Res. 10, 247-263 (1982).
16. Sellers, P. Proc. natn. Acad. Sci. U.S.A. 76, 3041-3044 (1979).
17. Boswell, D. R. & McLachlan, A. D. Nucleic Acids Res. 12, 457-464 (1984).
18. Taylor, W. R. J. molec. Biol. 188, 233-258 (1986).
19. Bajaj, M. & Blundell, T. L. A. Rev. Biophys. Bioengng 13, 453-492 (1984).
20. Greer, J. J. molec. Biol. 153, 1027-1042 (1981).
21. Richardson, J. S., Getzoff, E. D. & Richardson, D. C. Proc. natn. Acad. Sci. U.S.A. 75, 2574-2578 (1978).
22. [...] in Structural Studies on Molecules of Biological Interest (ed. Dodson, [...]) 501-508 (Clarendon Press, Oxford, 1981).
23. Lesk, A. M. & Chothia, C. J. molec. Biol. 136, 225-270 (1980).
24. Lesk, A. M. & Chothia, C. J. molec. Biol. 160, 325-342 (1982).
25. Chothia, C. & Lesk, A. M. J. molec. Biol. 160, 309-323 (1982).
26. Chothia, C. & Lesk, A. M. EMBO J. 5, 823-826 (1986).
27. Hubbard, T. & Blundell, T. L. Prot. Engng (submitted).
28. Browne, W. J. et al. J. molec. Biol. 42, 65-86 (1969).
29. Jones, T. A. J. appl. Cryst. 11, 268-272 (1978).
30. [...]
31. Isaacs, N. et al. Nature 271, 278-281 (1978).
32. Bedarkar, S. et al. Nature 270, 449-451 (1977).
33. Blundell, T. L., Bedarkar, S. & Humbel, R. E. Proc. natn. Acad. Sci. U.S.A. 75, 180-184 (1978).
34. Travers, P. et al. Nature 310, 235-238 (1984).
35. Blundell, T. L., Sibanda, B. L. & Pearl, L. H. Nature 304, 273-275 (1983).
36. Sibanda, B. L. et al. FEBS Lett. 174, 103-111 (1984).
37. Carlson, W., Karplus, M. & Haber, E. Hypertension 7, 13-26 (1985).
38. Akahane, K. et al. Hypertension 7, 3-12 (1985).
39. Chothia, C. et al. Science 233, 755-758 (1986).
40-44. [...]
45. de la Paz, P., Sutton, B. R., Darsley, M. J. & Rees, A. R. EMBO J. 5, 415-425 (1986).
46. Ripka, W. C. Nature 321, 93-94 (1986).
47. O'Neil, K. & De Grado, F. Proc. natn. Acad. Sci. U.S.A. 82, 4954-4958 (1985).
48. Sutcliffe, M. & Haneef, I. Inf. Q. Prot. Cryst. 18, 11-18 (1986).
49. [...] Sielecki, A. R. & James, M. N. G. Biochemistry 22, 4420-4433 (1983).
50. Read, R. J., Brayer, G. D., Jurasek, L. & James, M. N. G. Biochemistry 23, 6570-6575 (1984).
51. Sibanda, B. L. & Thornton, J. M. Nature 316, 170-174 (1985).
52. Edwards, M., Sternberg, M. J. E. & Thornton, J. M. Prot. Engng (in the press).
53. Moult, J. & James, M. N. Proteins 2, 146-163 (1986).
54. Milner-White, J. & Poet, R. Biochem. J. 240, 289-292 (1986).
55. Sibanda, B. L. thesis, Univ. London (1986).
56. Jones, T. A. & Thirup, S. EMBO J. 5, 819-822 (1986).
57. Crippen, G. M. in Chemometric Research Studies Series Vol. 1 (ed. Bawden, D.) 1-58 (Wiley, New York, 1981).
58. Weiner, S. J. et al. J. Am. chem. Soc. 106, 765-784 (1984).
59. Brooks, B. R. J. comput. Chem. 4, 187-217 (1983).
60. Shih, H. L., Brady, J. & Karplus, M. Proc. natn. Acad. Sci. U.S.A. 82, 1697-1700 (1985).
61. Hagler, A. T. & Moult, J. Nature 272, 222-227 (1979).
62. [...] Renneboog-Squilbin, C. J. molec. Biol. 181, 317-322 (1984).
63. Robson, B. & Platt, E. J. molec. Biol. 188, 259-281 (1986).
64. Summers, L. et al. Expl Eye Res. 43, 77-92 (1986).
65. Driessen, H. P. C. & White, H. in Molecular Replacement (ed. Machin, P. A.) 27-32 (Daresbury Laboratory, Daresbury, 1985).
66. Amit, A. G. et al. Science 233, 747-753 (1986).
67. Novotny, J., Bruccoleri, R. & Karplus, M. J. molec. Biol. 177, 787-818 (1984).
68. Banner et al. Nature 255, 609-614 (1975).
69. Abad-Zapatero, C. et al. Nature 286, 33-39 (1980).
70. Wistow, G., Summers, L. J. & Blundell, T. L. Nature 315, 771-773 (1985).
71. Wierenga, R. K. & Hol, W. G. Nature 301, 842-844 (1983).
72. Sternberg, M. J. E. & Taylor, W. R. FEBS Lett. 175, 387-392 (1984).
73. Neuberger, M. S., Williams, G. T. & Fox, R. O. Nature 312, 604-608 (1984).
74. [...]
75. [...] Proc. natn. Acad. Sci. U.S.A. 79, [...].
76. Reid, K. S. C., Lindley, P. F. & Thornton, J. M. FEBS Lett. 190, 209-213 (1985).
77. Singh, J. & Thornton, J. M. FEBS Lett. 191, 1-6 (1985).
78. [...] in Ciba Foundation Symposium 119 (eds Porter, R. & Whelan, J.) 150-163 (Wiley, Avon, 1986).
79. [...] in Ciba Foundation Symposium 119 (eds Porter, R. & Whelan, J.) 130-149 (Wiley, Avon, 1986).



